End-to-end model-based trajectory prediction for ro-ro ship route using dual-attention mechanism

With the rapid advance of economic globalization, the significant expansion of shipping volume has led to congestion on shipping routes, making trajectory prediction necessary for effective service and efficient management. While trajectory prediction can achieve a relatively high level of accuracy, the performance and generalization of prediction models remain critical bottlenecks. Therefore, this article proposes a dual-attention (DA) based end-to-end (E2E) neural network (DAE2ENet) for trajectory prediction. In the E2E structure, long short-term memory (LSTM) units are included to process sequential trajectory data from the encoder layer to the decoder layer. In the DA mechanism, global attention is introduced between the encoder and decoder layers to facilitate interactions between input and output trajectory sequences, and multi-head self-attention is utilized to extract sequential features from the input trajectory. In experiments, we use a ro-ro ship with a fixed navigation route as a case study. Compared with baseline models and benchmark neural networks, DAE2ENet achieves higher performance on trajectory prediction and better validates the effect of environmental factors on ship navigation.


Introduction
It is crucial to obtain dynamic information and data on ship navigation, so as to provide trajectory predictions and develop security-friendly programs and countermeasures for ships' intelligent navigation systems (Lehtola et al., 2019). Currently, navigation monitoring information in the shipping process is mainly obtained via the automatic identification system (AIS), which can record the ship number, navigation position, speed, course, and other information. AIS data can provide reliable data support for research and analysis such as maritime traffic analysis, trajectory prediction, and route planning (Zhe et al., 2020; Li et al., 2023). To predict trajectories from the perspective of ship navigation, researchers often use kinematic modeling-based methods, such as the Kalman filter, nearly constant velocity, Bayesian model, and Gaussian-sum filter, which have achieved good results in ship trajectory prediction (Mazzarella et al., 2015; Enrica et al., 2018; Baichen et al., 2019; Rong et al., 2019). The characteristics of these methods make them more suitable for ships sailing in a relatively stable environment. Ships under way are typically affected by several geographical conditions, which requires historical data to be considered when training prediction models in order to enhance generalization (Gao et al., 2021). In this situation, machine learning techniques can provide higher prediction accuracy and better generalization ability than kinematic methods. Classic machine-learning models have been extensively utilized in ship trajectory prediction, such as logistic regression (LR) (Sheng et al., 2017) and support vector machines (SVM) (Liu et al., 2019), along with many kinds of neural networks (NN), including multi-layer perceptron (MLP) (Valsamis et al., 2017), back-propagation NN (BPNN) (Simsir and Ertugrul, 2009), recurrent neural network (RNN) (Cho et al., 2014; Capobianco et al., 2021), and long short-term memory (LSTM) (Ma et al., 2022; Tang et al., 2022).
However, classic NNs lack a mechanism that can effectively mine information between sequences, so they have obvious limitations when dealing with sequential prediction problems (Zhang et al., 2022). RNN and LSTM can process sequential information by controlling the flow of information through the network with gating mechanisms (Schmidhuber and Hochreiter, 1997; Cho et al., 2014). Zhao et al. (2023) proposed an RNN-based encoder-decoder model for trajectory prediction during ship encounter situations, where the encoder-decoder model improved the handling of sequential information. For this case, attention mechanisms provide a more appropriate solution (Luong et al., 2015). Several researchers have introduced the attention mechanism into trajectory prediction models (Ma et al., 2020; Liang et al., 2022; Liu et al., 2022). Another group of researchers used attention mechanisms for feature extraction in sequential prediction. In Jiang and Zuo (2023), a multi-class trajectory prediction model was trained using the attention mechanism, and significant predictive ability was achieved in predicting the trajectory sequence. In Chen et al. (2023), an attention mechanism was applied to associate trajectory change trends with ship navigation states and to adaptively update the weighting factors of features to improve prediction accuracy.
After a review of existing studies, this article proposes a dual-attention (DA) based end-to-end (E2E) neural network (DAE2ENet) model for sequential prediction of ship trajectory. The two main improvements lie in the DA mechanism and the E2E structure. In the E2E structure, we design a parallel network of LSTM units to extract the complex relationship between the historical and current states of ship trajectories. In the DA mechanism, we incorporate two attention mechanisms, namely global attention (GA) and local attention (LA). The GA facilitates the identification of associations between the input and output sequences, which enables the dynamic adjustment of input sequence weights to suit various prediction tasks. The LA is employed to acquire significant characteristics from the input sequence when generating the output. In comparison experiments, traditional models (e.g., LR, SVM, BPNN) and classic NNs (e.g., RNN, LSTM, Attention) are used as baseline methods. The results show that DAE2ENet improves accuracy by around 50% compared to the classic NNs in ship trajectory prediction. In ablation experiments, the effects of LSTM, LA, and GA are investigated, showing that DA can successfully capture the latent information and associations in AIS data sequences to enhance the effectiveness and generalization of trajectory prediction. According to numerical results, DAE2ENet improves accuracy by around 30% compared to other attention models.
The remaining parts of this article are organized as follows. Section 2 presents the proposed model, including the dual attentions, the LSTM unit, and the end-to-end structure. Section 3 presents experimental results, comparisons, and validations. Section 4 presents conclusions and future plans.

Methodology

Variable statement of trajectory prediction
This article aims to predict the navigation position of the ship trajectory based on navigating variables (X_nav) and environmental variables (X_env). Navigating variables are mostly gathered from AIS and include longitude, latitude, speed, course, and so on. Environmental variables are mostly gathered from sensors and include wind, propeller pitch, rudder, and so on. Equation (1) gives the set of ship navigation status and environmental situation at time t.
During sailing, the current and historical navigational states affect the position of the ship at sea in the upcoming moments. The sequences of navigation and environmental variables are written as X_nav = {X_nav(t), X_nav(t−1), ..., X_nav(t−m)} and X_env = {X_env(t), X_env(t−1), ..., X_env(t−m)}, where m denotes the number of time steps used in trajectory prediction. To predict the future position of the ship trajectory at time t+1, the mathematical expression is formulated as Equation (2), where ŷ_{t+1} denotes the predicted position in longitude and latitude, and f(·) denotes the prediction function.
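The windowing implied by this formulation can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: the function name `make_windows`, the record layout (longitude and latitude in the first two positions), and the toy values are assumptions made only for the example.

```python
def make_windows(records, m):
    """Pair each window of m + 1 consecutive states with the next position.

    records: list of per-time-step feature lists; by assumption the first
             two entries of each record are longitude and latitude.
    m:       number of historical steps used besides the current one.
    """
    inputs, targets = [], []
    for t in range(m, len(records) - 1):
        window = records[t - m : t + 1]   # X(t-m), ..., X(t)
        target = records[t + 1][:2]       # (Lon(t+1), Lat(t+1))
        inputs.append(window)
        targets.append(target)
    return inputs, targets


# Toy track with made-up values: [lon, lat, speed, course] per time step.
track = [[10.0 + 0.1 * i, 55.0 + 0.05 * i, 12.0, 90.0] for i in range(6)]
X, y = make_windows(track, m=2)
print(len(X), len(X[0]), y[0])
```

Each input window holds m + 1 states, and the target is the position one step after the last state in the window, which corresponds to ŷ_{t+1} in the text.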

Overview of prediction framework
The proposed prediction framework of ship trajectory consists of three major modules, which are shown in Figure 1. Module 1 is data processing, which includes cleaning the raw data of exceptions, duplicates, errors, and missing values. This process also provides training and testing data for Modules 2 and 3.

Methodological design of DAE2ENet
The LSTM-based E2E structure efficiently extracts interactional information between sequences based on its inputs. Because the dependencies governed by this control of information flow are difficult to interpret, the attention mechanism has been used to learn the dependence between input and output information. Therefore, this article proposes that dual attention be incorporated into the E2E structure, where global attention is used to capture relationships from input to output, and local multi-head self-attention is used to extract dependent information within the input sequence. Figure 2 shows the visualization of the proposed model.
In the encoder block, we employ a forward network with two parallel LSTM cells in the hidden layer to process the sequence data from the input layer. After the hidden layer, the hidden states are aggregated by global attention, which assigns different relevance weights to the dependencies between the input sequence and the output value. In the decoder block, multi-head self-attention is employed to explore potential relationships within the sequences of input information and to generate representations of the relevance between input feature vectors. In the output layer, the states obtained from global attention and local self-attention are concatenated and passed through an MLP to produce the final output. In the encoder, the reshape operation gives the input sequence a new array shape without changing its data, and the result serves as input to the LSTM unit of the decoder. In the decoder, Add&Norm adds the inputs and outputs of the multi-head attention mechanism and applies layer normalization. After Add&Norm, a fully connected layer maps the features extracted by the multi-head attention mechanism to the final output space, as shown in Figure 2.

LSTM-based E2E
The DAE2ENet reconstructs the decoder based on the LSTM-based E2E structure and combines it with dual attention mechanisms. Figure 2 shows the overall structure of the DAE2ENet model, and Figure 3 shows the structure of the LSTM-based E2E as well as the operational structure of the LSTM cell. Throughout the information flow of the encoding block, the LSTM transforms the input sequence into a vector representation in the forward direction. LSTM is an RNN based on a gating strategy. It can effectively mitigate the information loss caused by vanishing gradients in traditional RNNs.
• x_t represents the input sequence, which is given in Equation (1).
• h_t and h′_t represent the hidden states of the LSTM cell.
• ŷ_{t+1} represents the output sequence, which is given in Equation (2).
• The symbol σ refers to the sigmoid activation function.
• C_t represents the cell state of the LSTM cell.
The calculation process for three gate mechanisms is listed as follows.
• Forget gate calculates the information that needs to be forgotten at time t, using the previous hidden state h_{t−1}, the previous cell state C_{t−1}, and the current input x_t. f_t denotes the state of the forget gate, as given in Equation (3).
• Input gate calculates the information that needs to be transferred at time t. In the LSTM cell, there are two input gates. The first gate uses the sigmoid function in Equation (4) to map the states h_{t−1} and x_t. The second gate also obtains its state from h_{t−1} and x_t, using the tanh function as in Equation (5).
Then, the cell state can be obtained according to Equation (9).
• Output gate calculates the current hidden state using the current cell state. Equation (6) shows the calculation formula for transforming the states h_{t−1} and x_t into the information O_t. Equation (7) shows the current hidden state obtained from O_{t−1} and C_t via the tanh function.
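The gate calculations listed above can be illustrated with a short NumPy sketch of one LSTM step. It assumes the standard LSTM formulation, in which every gate reads the concatenation of h_{t−1} and x_t; the article's own gate equations are not reproduced in this text and may differ in detail (for example, in how the forget gate uses C_{t−1}). The parameter names, dimensions, and random inputs are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_cell_step(x_t, h_prev, c_prev, params):
    """One step of a standard LSTM cell (assumed form, not the paper's exact equations)."""
    z = np.concatenate([h_prev, x_t])                 # [h_{t-1}, x_t]
    f_t = sigmoid(params["W_f"] @ z + params["b_f"])  # forget gate
    i_t = sigmoid(params["W_i"] @ z + params["b_i"])  # input gate, sigmoid part
    g_t = np.tanh(params["W_c"] @ z + params["b_c"])  # input gate, tanh part (candidate)
    c_t = f_t * c_prev + i_t * g_t                    # new cell state
    o_t = sigmoid(params["W_o"] @ z + params["b_o"])  # output gate
    h_t = o_t * np.tanh(c_t)                          # new hidden state
    return h_t, c_t


def init_params(input_dim, hidden_dim, seed=0):
    """Small random weights and zero biases for the four gate transforms."""
    rng = np.random.default_rng(seed)
    params = {}
    for name in ("f", "i", "c", "o"):
        params[f"W_{name}"] = rng.normal(0.0, 0.1, (hidden_dim, hidden_dim + input_dim))
        params[f"b_{name}"] = np.zeros(hidden_dim)
    return params


params = init_params(input_dim=4, hidden_dim=8)
h, c = np.zeros(8), np.zeros(8)
sequence = np.random.default_rng(1).normal(size=(5, 4))  # 5 steps, 4 features each
for x_t in sequence:
    h, c = lstm_cell_step(x_t, h, c, params)
print(h.shape, c.shape)
```

The loop plays the role of the forward pass through the encoding block: the hidden and cell states are carried from one time step to the next.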
Finally, the brief output function calculated by the LSTM unit can be depicted as Equations (9) and (10), where LSTMCell(·) represents a set of calculation rules for each gate mechanism. Notations θ and θ′ are the sets of training parameters, which contain

Design of dual-attention mechanism
The attention mechanism has become a standard paradigm in deep learning for handling information overload and re-allocation problems in sequential models (see Figure 4). DAE2ENet includes an attention mechanism based on key-value pairs, which contains three components: query, key, and value. The query and key vectors are combined through a dot product to obtain the basic attention score between the current q_i and each k_i, and the softmax function is used to map this score to the weight α_i. The weight α_i and the value v_i are multiplied and summed to obtain the final attention score as a weighted sum. The calculations are given in Equations (11)-(13).
where D represents the dimension of the query vector, and s(q_i, k_i) represents the score function for q_i and k_i.

$$\alpha_i = \frac{\exp(s(q_i, k_i))}{\sum_{j} \exp(s(q_j, k_j))} \qquad (12)$$

According to Figure 4 and Equation (13), the product of the value vector v_i and the weight α_i obtained by Equation (12) is the attention value between the query vector and the key vector.

FIGURE 4. The basic calculation process of attention mechanism.
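The basic attention calculation described above can be sketched in plain Python for a single query. The sketch assumes a scaled dot-product score, s(q, k) = q·k / sqrt(D), which is suggested by the mention of the query dimension D but is not confirmed by the surviving text; the function name and the toy vectors are illustrative only.

```python
import math


def attention(query, keys, values):
    """Attention for one query: dot-product scores, softmax weights, weighted sum."""
    d = len(query)
    # Score between the query and every key (scaling by sqrt(D) is an assumption).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    shift = max(scores)                        # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]        # softmax weights alpha_i
    dim_v = len(values[0])
    # Weighted sum of the value vectors.
    context = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)]
    return weights, context


weights, context = attention(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[1.0, 2.0], [3.0, 4.0]],
)
print(weights, context)
```

The key that is most aligned with the query receives the larger weight, so the output is pulled toward the corresponding value vector.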
Based on the basic attention mechanism, we design dual attentions in DAE2ENet as shown in Figure 2. For the encoder, all the hidden states are used as input to calculate attention, which is considered global attention (see Figure 5A). The value of global attention is calculated by taking the states h′ of the encoder and the decoding state h′_decoder of the decoder as H. The calculations of A_global are given in Equation (14), where H_q, H_k, and H_v denote the query, key, and value vectors of global attention. For the decoder, multi-head self-attention is adopted to calculate the relationships within the input sequence, which is considered local attention (see Figure 5B). In the multi-head attention calculation, a group of attention vectors Q_τ, K_τ, V_τ is obtained from the input X_t, and the value of head τ is calculated as in Equation (15). The outputs of all heads are concatenated to give A_local as in Equation (16), where W_MH is a weight parameter learned during training, and concat(·) refers to the concatenation function used to connect the outputs of the multi-head self-attention.
Finally, the predicted value of ŷ_{t+1} can be obtained by Equation (17).
where FCNN denotes fully-connected neural networks, and MLP denotes multi-layer perceptron neural networks.
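The local (multi-head self-attention) branch can be sketched with NumPy as below. This is an illustration under stated assumptions, not the DAE2ENet implementation: the per-head projection matrices, the scaled dot-product score, the dimensions, and the random weights are all hypothetical, and the global-attention branch and the final FCNN/MLP layers are omitted.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(X, W_q, W_k, W_v, W_mh):
    """X: (T, d_model); W_q, W_k, W_v: (heads, d_model, d_head); W_mh: (heads * d_head, d_model)."""
    heads = []
    for Wq, Wk, Wv in zip(W_q, W_k, W_v):
        Q, K, V = X @ Wq, X @ Wk, X @ Wv          # Q_tau, K_tau, V_tau from the same input
        scores = Q @ K.T / np.sqrt(Q.shape[-1])   # pairwise scores between time steps
        heads.append(softmax(scores) @ V)         # output of head tau, shape (T, d_head)
    return np.concatenate(heads, axis=-1) @ W_mh  # concat(...) followed by W_MH


rng = np.random.default_rng(0)
T, d_model, n_heads, d_head = 6, 8, 2, 4
X = rng.normal(size=(T, d_model))
W_q, W_k, W_v = (rng.normal(size=(n_heads, d_model, d_head)) for _ in range(3))
W_mh = rng.normal(size=(n_heads * d_head, d_model))
A_local = multi_head_self_attention(X, W_q, W_k, W_v, W_mh)
print(A_local.shape)
```

Because queries, keys, and values are all derived from the same input sequence, each head relates every time step of the input to every other time step, which is what distinguishes this branch from the encoder-to-decoder global attention.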

Numerical experiments

Data description
The primary trajectory of the ship is depicted in Figure 6A. This article obtained the historical navigation trajectory from 15 February 2010 to 13 April 2010, which contains two routes. Route 1 (234 trajectories, shown in Figure 6B) is the main route, and Route 2 (38 trajectories, shown in Figure 6C) is an alternative route for worse weather conditions. The details of trajectory data are collected and displayed in Table 1.
The navigation mode of ships on the same route is consistent. Therefore, the experimental data were randomly divided for both Routes 1 and 2, with 60% going toward the training set and 40% set aside for testing. The experimental data come from AIS data and onboard sensor data, and the status information is shown in Table 2. The numerical experiments discussed here involve two main components. Firstly, the main experiment uses ship navigation factors for model training and validation. Secondly, in the discussion section, numerical experiments are conducted to explore the impact of environmental factors on ship trajectory prediction.
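A random 60/40 split of this kind can be sketched as follows. The sketch assumes the split is made at the level of whole trajectories and per route, which the text does not state explicitly; the function name, the use of integer trajectory identifiers, and the seed are hypothetical.

```python
import random


def split_trajectories(trajectory_ids, train_fraction=0.6, seed=42):
    """Shuffle trajectory identifiers and split them into training and testing sets."""
    ids = list(trajectory_ids)
    random.Random(seed).shuffle(ids)          # seeded for reproducibility
    cut = round(train_fraction * len(ids))
    return ids[:cut], ids[cut:]


# Trajectory counts taken from the data description: 234 for Route 1, 38 for Route 2.
route1_train, route1_test = split_trajectories(range(234))
route2_train, route2_test = split_trajectories(range(38))
print(len(route1_train), len(route1_test), len(route2_train), len(route2_test))
```

Splitting each route separately keeps both routes represented in the training and testing sets despite the large difference in trajectory counts.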

Experimental preparation and setting

Model evaluation criterion
Experimental evaluation is an important component of conducting numerical experiments. During the model training process, we chose the mean squared error (MSE) to quantify the disparity between the estimated and observed outcomes, and the model parameters are then adjusted via the backpropagation method. This improves prediction accuracy by minimizing errors relative to the true values, thus achieving the purpose of model training. After the model training is completed, we rely on the root mean square error (RMSE) to measure model performance. The two metrics are defined in Equations (18) and (19).
where p indicates sample quantity, ŷi denotes predicted positions and y i represents real navigation positions.
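The two metrics can be written out in a few lines of Python. The sketch assumes the usual definitions, with the error of each sample taken as the squared Euclidean distance between the predicted and real (longitude, latitude) pair and averaged over the p samples; the article's exact normalisation is not visible in this text, and the toy values are made up.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error over p samples of (longitude, latitude) pairs."""
    p = len(y_true)
    squared = sum(
        (a - b) ** 2
        for true_pos, pred_pos in zip(y_true, y_pred)
        for a, b in zip(true_pos, pred_pos)
    )
    return squared / p


def rmse(y_true, y_pred):
    """Root mean square error: the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


y_true = [[0.0, 0.0], [0.0, 0.0]]
y_pred = [[3.0, 4.0], [0.0, 0.0]]
print(mse(y_true, y_pred), rmse(y_true, y_pred))
```

MSE is convenient as a training loss because it is smooth, while RMSE is reported after training because it is expressed in the same units as the predicted positions.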

Parameter settings of the model
In the experiments, the classic optimization algorithm Adam (Kingma and Ba, 2015) is used to update the trainable parameters of our model architecture, and the learning rate needs to be determined during operation. The sequence-encoding section of the model is typically formed by three LSTM network structures. The hidden-layer size is searched for its optimal value within the range [16, 320]. The magnitude of this parameter indicates the degree of non-linearity available for fitting the model. When it is too large, the model tends to overfit the training set. Each epoch trains on 5,120 samples, and this is repeated 2,000 times. Additionally, to prevent overfitting of the model, dropout (Srivastava et al., 2014) and regularization terms were employed during the training process. Through numerous experiments, the search ranges, interval granularity, and optimal parameter values of the model were determined, as listed in Table 3.

Baseline models
• LR is a continuous probability estimation method that can be used to solve regression problems when its output is not compressed nonlinearly by a sigmoid function (Sheng et al., 2017).
• SVM determines an optimal kernel function in regression tasks, making the learned function as close as possible to the continuous target variables (Liu et al., 2019).
• BPNN is a prevalent forecasting neural network trained with backpropagation (Lehtola et al., 2019).
• RNN is a classic neural network for sequential prediction (Capobianco et al., 2021).
• LSTM is one of the RNNs incorporating gating mechanisms (Tang et al., 2022).
• EncDec-ATTN is an encoder-decoder model including an attention mechanism (Capobianco et al., 2021).
• DAE2ENet is the proposed method in this article.

Experimental comparisons and analyses

Comparison of model performance
In the comparison with baseline models, we only use the navigation status as input, X_nav = {X_nav(t), X_nav(t−1), ..., X_nav(t−m)}, where X_nav(t) = {Lon(t), Lat(t), Spe(t), Cou(t)}. The output is the predicted position ŷ(t+1) = {Lon(t+1), Lat(t+1)}. The last column of Table 4 shows the optimal parameter values of each model during training. When SVM is used for the regression prediction experiments, we chose the Gaussian radial basis function (RBF) as the kernel and selected the penalty coefficient c = 2.1 for the objective function and the coefficient gamma = 0.02. For the neural network models, the optimal parameters in this column represent the number of hidden layers, the number of hidden units, the learning rate, and the regularization value, respectively. According to the results in Table 4, deep learning methods based on LSTM achieve better performance than traditional methods (such as LR, SVM, and BPNN). Moreover, models incorporating attention mechanisms perform better than the baseline methods. LAE2EDNet and GAE2ENet perform worse than DAE2ENet, which reveals that incorporating the dual-attention mechanism boosts overall performance. In addition, the variant model DAE2EMLP also performs worse than DAE2ENet, indicating that when LSTM is used as the encoding and decoding structure, gating mechanisms can extract information from sequential data more effectively.

Result of ship trajectory prediction
The visual representation of the experimental results is depicted in Figure 7. Our DAE2ENet model predicts a route that is consistent with the actual location of Route 1. Figure 7A displays the forecast outcomes of each model, while Figure 7D highlights discrepancies between actual and estimated movement trajectories. The overall comparison shows that predictions of longitude and

Discussion and implications
During navigation, ships are influenced not only by their navigation factors (X_nav = {Lon, Lat, Cou, Spe}) but also by environmental factors (X_env = {WS, WA, SPP, PPP, PSR, SSR}). In this section, we incorporated the environmental factors listed in Table 2 into DAE2ENet and investigated their effect on our model. The investigation data used four categories and six types of environmental factors.
In this investigation, the three groups of environmental factors were added sequentially to the ship navigation status X_nav to explore the effect of external factors on maritime route predictions. The results are shown in Table 5, where the bold values mark the best prediction results obtained under the current conditions. For Route 1, the RMSE value is minimal when ship navigation X_nav is combined with X_env = {WS, WA}. For Route 2, the RMSE value is minimal without consideration of environmental factors. When propeller pitch X_env = {SPP, PPP} and rudder angle X_env = {PSR, SSR} are added, the prediction accuracy of the model decreases.
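The sequential addition of environmental groups can be made concrete with a small Python sketch that builds the input feature list for each experimental setting. It assumes the combinations are cumulative and follow the order wind, propeller pitch, rudder angle, as the surrounding text suggests; the function and constant names are hypothetical, and no RMSE values are reproduced here.

```python
NAV = ["Lon", "Lat", "Cou", "Spe"]
ENV_GROUPS = [["WS", "WA"], ["SPP", "PPP"], ["PSR", "SSR"]]


def cumulative_feature_sets(base, groups):
    """Return the feature lists: base, base + group 1, base + groups 1-2, and so on."""
    feature_sets = [list(base)]
    current = list(base)
    for group in groups:
        current = current + group          # add the next environmental group
        feature_sets.append(list(current))
    return feature_sets


feature_sets = cumulative_feature_sets(NAV, ENV_GROUPS)
for features in feature_sets:
    print(features)
```

Each feature list would define the input columns of one training run, so that the change in RMSE between consecutive runs can be attributed to the group added last.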
The findings reveal that, in the actual trajectory prediction process, environmental factors do not necessarily have a positive effect on prediction performance.
To further investigate the different impacts of environmental variables on the prediction results of the two routes, we examine the changes in the global attention weights of the two routes when combined with X_env = {WS, WA}, as shown in Figure 9. In the case of Route 1 (see Figure 9A), the visual results show that the weights change at different positions, which helps to improve the model's ability to predict accurately when the WS and WA factors are considered. In the case of Route 2 (see Figure 9B), the distribution of attention weights at different time steps is more concentrated, which results in better prediction results for the model without considering the WA and WS factors.
According to the comparative analysis of two sets of experiments, the attention mechanism can affect the sequence information obtained from the output by adjusting the attention weight, thereby enhancing the sequence of information related to the future and obtaining better prediction results.However, if the feature information of the input data is insufficient or unclear, the attention mechanism might lead the model to concentrate on inaccurate or irrelevant information, leading to a decrease in model performance.
Within the prediction framework (Figure 1), the processed data feed Modules 2 and 3. Module 2 is model building and training, where DAE2ENet is trained by incorporating the LSTM-based E2E structure, local attention, and global attention. Module 2 also includes the fine-tuning of DAE2ENet parameters based on training data. Module 3 is prediction and validation, which includes comparison experiments with baseline models and ablation experiments with the proposed models.

FIGURE 1. Overview of prediction framework for ship trajectory.

FIGURE 2. Network structure and organization of DAE2ENet.

FIGURE 5. Dual-attention mechanism of DAE2ENet. (A) Global attention. (B) Local attention.

FIGURE 7. Prediction results of the trajectory for Route 1. (A) Model predicted results of Route 1. (B) Prediction results of the turning phase. (C) Prediction results of the straight stage. (D) Model predicted error value of Route 1. (E) Prediction error value of the turning phase. (F) Prediction error value of the straight stage.

FIGURE 9. Investigation of attention weight scores in trajectory prediction for different factors. (A) Case of Route 1. (B) Case of Route 2.

TABLE 2. Navigation status and environmental situation of ship trajectory.

TABLE 4. Comparison of model performance indicators. The bold values mean the best prediction results can be obtained under current conditions.

TABLE 5. RMSE of cumulative combinations for different variables.